Inter-rater agreement in evaluation of disability: systematic review of reproducibility studies
Abstract
OBJECTIVES To explore agreement among healthcare professionals assessing eligibility for work disability benefits. DESIGN Systematic review and narrative synthesis of reproducibility studies. DATA SOURCES Medline, Embase, and PsycINFO searched up to 16 March 2016, without language restrictions, and review of bibliographies of included studies. ELIGIBILITY CRITERIA Observational studies investigating reproducibility among healthcare professionals performing disability evaluations using a global rating of working capacity and reporting inter-rater reliability by a statistical measure or descriptively. Studies could be conducted in insurance settings, where decisions on ability to work include normative judgments based on legal considerations, or in research settings, where decisions on ability to work disregard normative considerations. REVIEW METHODS Teams of paired reviewers identified eligible studies, appraised their methodological quality and generalisability, and abstracted results with pretested forms. As heterogeneity of research designs and findings impeded a quantitative analysis, a descriptive synthesis stratified by setting (insurance or research) was performed. RESULTS From 4562 references, 101 full text articles were reviewed. Of these, 16 studies conducted in an insurance setting and seven in a research setting, performed in 12 countries, met the inclusion criteria. Studies in the insurance setting were conducted with medical experts assessing actual disability claimants, claimants played by actors, hypothetical cases, or short written scenarios. Conditions were mental (n=6, 38%), musculoskeletal (n=4, 25%), or mixed (n=6, 38%). Applicability of findings from studies conducted in an insurance setting to real life evaluations ranged from generalisable (n=7, 44%) and probably generalisable (n=3, 19%) to probably not generalisable (n=6, 37%). Median inter-rater reliability among experts was 0.45 (range: intraclass correlation coefficient 0.86 to κ=−0.10). Inter-rater reliability was poor in six studies (37%) and excellent in only two (13%). This contrasts with studies conducted in the research setting, where the median inter-rater reliability was 0.76 (range 0.91-0.53), and 71% (5/7) of studies achieved excellent inter-rater reliability. Reliability between assessing professionals was higher when the evaluation was guided by a standardised instrument (23 studies, P=0.006). No such association was detected for subjective or chronic health conditions or the studies' generalisability to real world evaluation of disability (P=0.46, 0.45, and 0.65, respectively). CONCLUSIONS Despite the common use of medical evaluations of disability for work and their far reaching consequences for workers claiming disabling injury or illness, research on the reliability of these evaluations is limited and indicates high variation in judgments among assessing professionals. Standardising the evaluation process could improve reliability. Development and testing of instruments and structured approaches to improve reliability in evaluation of disability are urgently needed.
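For readers unfamiliar with the κ statistic quoted in the results, the following is a minimal sketch of Cohen's κ for two raters giving categorical work capacity judgments. It is an illustration only and not the review's own analysis: the function name `cohen_kappa`, the labels "fit" and "unfit", and the ten ratings are all hypothetical. The review also reports intraclass correlation coefficients, which are not shown here.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who rated the same cases (hypothetical helper)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of cases")
    n = len(rater_a)
    # Observed agreement: share of cases where both raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal proportions, summed over labels.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of ten claimants by two medical experts.
expert_1 = ["fit", "fit", "fit", "fit", "fit", "unfit", "unfit", "unfit", "unfit", "unfit"]
expert_2 = ["fit", "fit", "fit", "fit", "unfit", "fit", "unfit", "unfit", "unfit", "unfit"]
kappa = cohen_kappa(expert_1, expert_2)
print(round(kappa, 2))
```

A κ of 0 corresponds to agreement no better than chance, and a negative value, such as the lower end of the range reported for the insurance setting, means the raters agreed less often than chance alone would predict.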
Similar resources
Critical Review:
This critical review examines the inter-rater reliability of two clinical feeding assessments of infant oral sensorimotor function, the Neonatal Oral-Motor Scale (NOMAS) and the Preterm Infant Breastfeeding Behaviour Scale (PIBBS). Study designs include four diagnostic test studies and one systematic review. Tests of inter-rater reliability for the PIBBS resulted in acceptable agreement between...
Test-Retest and Inter-Rater Reliability Study of the Schedule for Oral-Motor Assessment in Persian Children
Objectives: Reliable and valid clinical tools to screen, diagnose, and describe eating functions and dysphagia in children are highly warranted. Today most specialists are aware of the role of assessment scales in the treatment of affected individuals. However, the problem is that the clinical tools used might be nonstandard, and worldwide, there is no integrated assessment performed to assess ...
Some Notes on Critical Appraisal of Prevalence Studies; Comment on: “The Development of a Critical Appraisal Tool for Use in Systematic Reviews Addressing Questions of Prevalence”
Decisions in healthcare should be based on information obtained according to the principles of Evidence-Based Medicine (EBM). An increasing number of systematic reviews are published which summarize the results of prevalence studies. Interpretation of the results of these reviews should be accompanied by an appraisal of the methodological quality of the included data and studies. The critical a...
Psychometric evaluation of Stanford health assessment questionnaire 8-item disability index (HAQ 8-item DI) in elderly people
Introduction: Measuring the level of disability in elderly people, given their physical status, requires an instrument that is practicable for them and easy to score. Objective: To survey the reliability and validity of the HAQ 8-item DI in older people residing in the Kashan Golabchi nursing home. Methods: In this methodological study, samples were chosen by census...
Functional Movement Screen in Elite Boy Basketball Players: A Reliability Study
Purpose: To investigate the reliability of Functional Movement Screen (FMS) in basketball players. A few studies have compared the reliability of FMS between raters with different experience in athletes. The purpose of this study was to compare the FMS scoring between the beginners and expert raters using video records. Methods: This is a cross-sectional study. The study subjects compris...
Volume: 356
Publication date: 2017